Automated labeling of bibliographic data extracted from biomedical online journals

نویسندگان

  • Jongwoo Kim
  • Daniel X. Le
  • George R. Thoma
چکیده

A prototype system has been designed to automate the extraction of bibliographic data (e.g., article title, authors, abstract, affiliation and others) from online biomedical journals to populate the National Library of Medicine’s MEDLINE® database. This paper describes a key module in this system: the labeling module that employs statistics and fuzzy rule-based algorithms to identify segmented zones in an article’s HTML pages as specific bibliographic data. Results from experiments conducted with 1,149 medical articles from forty-seven journal issues are presented.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automated Labeling Of Biomedical Online Journal Articles

An automated labeling (AL) module has been developed to automate the extraction of bibliographic data (e.g., article title, authors, affiliation, abstract, and others) from online biomedical journals for the National Library of Medicine’s MEDLINE database. The AL module employs string matching, statistics, and fuzzy rule-based algorithms to identify segmented zones in an article’s HTML pages a...

متن کامل

Automated Labeling Algorithms for Biomedical Document Images

The National Library of Medicine (NLM) has developed an automated system, named Medical Article Records System (MARS), to process bibliographic data (title, authors, affiliation, abstract, etc.) in biomedical journal articles for its MEDLINE database. This paper describes a labeling module in the MARS, which automatically extract the bibliographic data in biomedical journal articles. The label...

متن کامل

Automated Labeling from Biomedical Journals published in Foreign Languages

An automated labeling (AL) module is developed to produce bibliographic records such as English title, vernacular title, author, affiliation, and English abstract from biomedical articles published in foreign language journals. Optical character recognition (OCR) output from scanned biomedical journals is used in this labeling process. Since frequently occurring words in a zone are important fe...

متن کامل

Automated labeling in document images

The National Library of Medicine (NLM) is developing an automated system to produce bibliographic records for its MEDLINE database. This system, named Medical Article Record System (MARS), employs document image analysis and understanding techniques and optical character recognition (OCR). This paper describes a key module in MARS called the Automated Labeling (AL) module, which labels all zon...

متن کامل

Automated Cleanup Processing for Extracting Bibliographic Data from Biomedical Online Journals

An R&D division of the National Library of Medicine (NLM) has developed the Web-based Medical Article Records System (WebMARS) to create citations from online biomedical journals. This paper presents one important part of this system, the automated cleanup module that extracts bibliographic information from HTML-formatted text based on a rule-based approach. A learning scheme comparing the outp...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003